A Human-machine Comparison in Speech Recognition Based on a Logatome Corpus
Abstract
In this study, a fair comparison of human and machine speech recognition is established by using the same paradigms for human speech recognition (HSR) and automatic speech recognition (ASR). To ensure equal conditions, a speech database specifically designed for this task is used. The results for HSR and ASR are broken down by several intrinsic variabilities such as speaking rate, speaking effort and dialect. Across all conditions, ASR error rates are at least 300% higher than those of humans, even though no contextual knowledge can be exploited. A more detailed analysis of errors in HSR and ASR is carried out by decomposing speech into phonetic features such as voicing, manner of articulation and place of articulation. Confusion matrices for these features show that voicing information is crucial to distinguish between certain consonants. The most prominent features for ASR often neglect voicing information, which might contribute to the large gap in performance between HSR and ASR.
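The feature-level analysis described in the abstract can be illustrated with a minimal sketch. It assumes a phoneme confusion table stored as counts keyed by (presented, recognized) pairs and collapses it into a voicing confusion table; it also computes a relative error increase under the usual reading of "X% higher". The function names, the small consonant set and the toy counts are hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Voicing labels for a few plosives (standard phonetic classification).
# The choice of this small consonant set is an assumption for illustration.
VOICING = {
    "p": "unvoiced", "t": "unvoiced", "k": "unvoiced",
    "b": "voiced", "d": "voiced", "g": "voiced",
}


def collapse_confusions(phoneme_confusions, feature_of):
    """Sum phoneme-level confusion counts into feature-level counts."""
    feature_confusions = defaultdict(int)
    for (presented, recognized), count in phoneme_confusions.items():
        key = (feature_of[presented], feature_of[recognized])
        feature_confusions[key] += count
    return dict(feature_confusions)


def error_rate(confusions):
    """Fraction of responses whose label differs from the presented one."""
    total = sum(confusions.values())
    errors = sum(c for (p, r), c in confusions.items() if p != r)
    return errors / total


def relative_increase(asr_error, hsr_error):
    """Relative increase of the ASR error over the HSR error, in percent."""
    return 100.0 * (asr_error - hsr_error) / hsr_error


# Toy counts (made up, not results from the paper).
toy_counts = {
    ("p", "p"): 8, ("p", "b"): 2,
    ("b", "b"): 9, ("b", "p"): 1,
    ("t", "t"): 9, ("t", "d"): 1,
}
voicing_counts = collapse_confusions(toy_counts, VOICING)
voicing_error = error_rate(voicing_counts)
```

Under this reading, an ASR error rate that is 300% higher than the HSR error rate is four times as large, for example 8% against 2%.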
Related resources
Oldenburg logatome speech corpus (OLLO) for speech recognition experiments with humans and machines
This paper introduces the new OLdenburg LOgatome speech corpus (OLLO) and outlines design considerations during its creation. OLLO is distinct from previous ASR corpora as it specifically targets (1) the fair comparison between human and machine speech recognition performance, and (2) the realistic representation of intrinsic variabilities in speech that are significant for automatic speech rec...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...